In this project, I have used the “cars_multi” and “cars_price” datasets. I have tried to understand how the variables in these datasets relate to each other, to uncover interesting patterns, and to communicate those findings. I’m going to focus on the correlation between mpg and the other properties.

I am going to use the following R libraries to assist in my analysis:

library(ggplot2)
require(GGally)
## Loading required package: GGally
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
require(ggthemes)
## Loading required package: ggthemes
require(plotly)
## Loading required package: plotly
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
require(dplyr)
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
require(heatmaply)
## Loading required package: heatmaply
## Warning: package 'heatmaply' was built under R version 4.0.5
## Loading required package: viridis
## Loading required package: viridisLite
require(ggcorrplot)
## Loading required package: ggcorrplot
## Warning: package 'ggcorrplot' was built under R version 4.0.5
cars_multi <- read.csv("cars_multi.csv")
cars_price <- read.csv("cars_price.csv")
cars <- left_join(cars_multi, cars_price, by="ID")
model_years <- sort(unique(cars$model))
cars$model <- factor(cars$model, labels = model_years)
origins <- c('USA', 'Europe', 'Japan')
cars$origin <- factor(cars$origin, labels = origins)
str(cars)
## 'data.frame':    398 obs. of  11 variables:
##  $ ID          : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ mpg         : num  18 15 18 16 17 15 14 14 14 15 ...
##  $ cylinders   : int  8 8 8 8 8 8 8 8 8 8 ...
##  $ displacement: num  307 350 318 304 302 429 454 440 455 390 ...
##  $ horsepower  : chr  "130" "165" "150" "150" ...
##  $ weight      : int  3504 3693 3436 3433 3449 4341 4354 4312 4425 3850 ...
##  $ acceleration: num  12 11.5 11 12 10.5 10 9 8.5 10 8.5 ...
##  $ model       : Factor w/ 13 levels "70","71","72",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ origin      : Factor w/ 3 levels "USA","Europe",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ car_name    : chr  "chevrolet chevelle malibu" "buick skylark 320" "plymouth satellite" "amc rebel sst" ...
##  $ price       : num  25562 24221 27241 33685 20000 ...
cars$horsepower <- as.numeric(cars$horsepower)
## Warning: NAs introduced by coercion
summary(cars)
##        ID             mpg          cylinders      displacement  
##  Min.   :  1.0   Min.   : 9.00   Min.   :3.000   Min.   : 68.0  
##  1st Qu.:100.2   1st Qu.:17.50   1st Qu.:4.000   1st Qu.:104.2  
##  Median :199.5   Median :23.00   Median :4.000   Median :148.5  
##  Mean   :199.5   Mean   :23.51   Mean   :5.455   Mean   :193.4  
##  3rd Qu.:298.8   3rd Qu.:29.00   3rd Qu.:8.000   3rd Qu.:262.0  
##  Max.   :398.0   Max.   :46.60   Max.   :8.000   Max.   :455.0  
##                                                                 
##    horsepower        weight      acceleration       model        origin   
##  Min.   : 46.0   Min.   :1613   Min.   : 8.00   73     : 40   USA   :249  
##  1st Qu.: 75.0   1st Qu.:2224   1st Qu.:13.82   78     : 36   Europe: 70  
##  Median : 93.5   Median :2804   Median :15.50   76     : 34   Japan : 79  
##  Mean   :104.5   Mean   :2970   Mean   :15.57   82     : 31               
##  3rd Qu.:126.0   3rd Qu.:3608   3rd Qu.:17.18   75     : 30               
##  Max.   :230.0   Max.   :5140   Max.   :24.80   70     : 29               
##  NA's   :6                                      (Other):198               
##    car_name             price      
##  Length:398         Min.   : 1598  
##  Class :character   1st Qu.:23110  
##  Mode  :character   Median :30000  
##                     Mean   :29684  
##                     3rd Qu.:36430  
##                     Max.   :53746  
## 

Univariate Plots: Weight

The most common weights fall between 2000 and 3000, and for the majority of the cars the weight value is unique, shared with no other car in the dataset.
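The two observations above can be checked with a short sketch. It assumes the merged `cars` data frame built earlier; the helper names `plot_weight_hist` and `share_unique_weights` and the bin width of 250 are my own illustrative choices, not part of the original analysis.

```r
library(ggplot2)

# Histogram of car weights. The bin width of 250 is an assumed value,
# chosen only for illustration.
plot_weight_hist <- function(df, binwidth = 250) {
  ggplot(df, aes(x = weight)) +
    geom_histogram(binwidth = binwidth) +
    labs(x = "Weight", y = "Number of cars")
}

# Fraction of cars whose weight value appears exactly once in the data.
share_unique_weights <- function(df) {
  w <- df$weight
  repeated <- duplicated(w) | duplicated(w, fromLast = TRUE)
  mean(!repeated)
}
```

Calling `plot_weight_hist(cars)` should draw the distribution, and `share_unique_weights(cars)` should return the proportion of cars with a one-of-a-kind weight.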

Correlation

This plot shows the correlation between all pairs of features.
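A correlation plot of this kind could be produced roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, the `ID` column is dropped because it is a row identifier rather than a feature, and pairwise-complete observations are used because `horsepower` contains 6 missing values after the numeric coercion.

```r
library(ggcorrplot)

# Correlation matrix of the numeric columns, rounded to two decimals.
numeric_cor <- function(df) {
  num <- df[vapply(df, is.numeric, logical(1))]
  num$ID <- NULL  # row identifier, not a feature (assumption)
  round(cor(num, use = "pairwise.complete.obs"), 2)
}

# Lower-triangle correlation plot with the coefficients printed in each cell.
plot_cor <- function(df) {
  ggcorrplot(numeric_cor(df), type = "lower", lab = TRUE)
}
```

With the merged data, `plot_cor(cars)` should give the overview, and the `mpg` row of `numeric_cor(cars)` lists the correlations this report focuses on.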

Multivariate Plots

This section includes charts that involve three or more variables simultaneously, to give us a more complete look at the questions that presented themselves in the previous sections. Building on the observation in the previous plot, I want to see how each region’s product mix has evolved over time. The best way to illustrate this is with a stacked bar chart over time for each region.
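One way to draw such a stacked bar chart is sketched below. It assumes the `cars` data frame with `model` and `origin` as factors, as prepared earlier; the function name `plot_cylinder_mix` and the single-column facet layout are my own choices for illustration.

```r
library(ggplot2)

# Number of cars per model year, stacked by cylinder count,
# with one panel per region of origin.
plot_cylinder_mix <- function(df) {
  ggplot(df, aes(x = factor(model), fill = factor(cylinders))) +
    geom_bar() +                      # geom_bar stacks counts by default
    facet_wrap(~ origin, ncol = 1) +  # one row per region
    labs(x = "Model year", y = "Number of cars", fill = "Cylinders")
}
```

`plot_cylinder_mix(cars)` places the panels in the order of the `origin` factor levels, so the USA panel comes first.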

As shown in the top section, while the number of four-cylinder cars increases over time, six- and eight-cylinder cars comprise the majority of the United States’ product mix until 1980. Europe and Japan almost exclusively produce four-cylinder cars with just a few exceptions over the entire 13-year period. We can see this phenomenon illustrated when we compare each region’s weight distributions per year using boxplots.
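The weight comparison could be sketched like this, again assuming the prepared `cars` data frame. The function name `plot_weight_by_year` is hypothetical.

```r
library(ggplot2)

# One boxplot of weight per model year, with one panel per region.
plot_weight_by_year <- function(df) {
  ggplot(df, aes(x = factor(model), y = weight)) +
    geom_boxplot() +
    facet_wrap(~ origin, ncol = 1) +
    labs(x = "Model year", y = "Weight")
}
```

Note that the line inside each box drawn by `plot_weight_by_year(cars)` is the median, not the mean.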

As we can see, US cars are much heavier on average than European and, especially, Japanese cars until about 1980, when the US weight distribution comes down considerably. From above we know that 1980 is when the US shifted to a higher percentage of four-cylinder cars. Note that average weights stay more nearly constant for Europe and Japan over the same period.

Now we can create a similar comparative boxplot for MPG over time.
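A possible version of the MPG comparison, together with a small numeric companion table, is sketched here. Both helper names are hypothetical, and the code assumes the prepared `cars` data frame.

```r
library(ggplot2)

# Median MPG for every origin / model-year combination (base R).
median_mpg_by_year <- function(df) {
  aggregate(mpg ~ origin + model, data = df, FUN = median)
}

# One boxplot of MPG per model year, with one panel per region.
plot_mpg_by_year <- function(df) {
  ggplot(df, aes(x = factor(model), y = mpg, fill = origin)) +
    geom_boxplot() +
    facet_wrap(~ origin, ncol = 1) +
    labs(x = "Model year", y = "MPG") +
    theme(legend.position = "none")  # panels already name the region
}
```

The table from `median_mpg_by_year(cars)` makes it easy to read off the exact medians behind the boxes in `plot_mpg_by_year(cars)`.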

The average MPG for each region shows an upward trend, especially towards the end of the 1970s and into the early 1980s. Since Europe and Japan also increased MPG, it is apparent that increasing overall fuel economy was not solely about changing the product mix away from six- and eight-cylinder cars. Indeed, the fuel economy of four-cylinder cars increased over time. We can see that more clearly by restricting our analysis to include only four-cylinder cars.
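The restriction to four-cylinder cars could look like the sketch below. I use a line through the yearly medians instead of boxplots to make the trend easier to follow; that swap, and the helper names, are my own assumptions rather than the original plot.

```r
library(ggplot2)

# Keep only the four-cylinder cars.
four_cyl <- function(df) subset(df, cylinders == 4)

# Median MPG of four-cylinder cars per model year, one line per region.
plot_four_cyl_mpg <- function(df) {
  ggplot(four_cyl(df),
         aes(x = factor(model), y = mpg, colour = origin, group = origin)) +
    stat_summary(fun = median, geom = "line") +
    stat_summary(fun = median, geom = "point") +
    labs(x = "Model year", y = "Median MPG (four-cylinder cars)",
         colour = "Origin")
}
```

If the lines in `plot_four_cyl_mpg(cars)` rise for every region, that would support the point that fuel economy improved even within a fixed cylinder count.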

Conclusions

The weight of a car is a strong determinant of its fuel efficiency, as expressed by MPG. Four-cylinder cars are the lightest, and eight-cylinder cars are the heaviest. Therefore, four-cylinder cars get the best gas mileage.
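These conclusions can be cross-checked numerically with a few lines of base R. This is only a sketch with hypothetical helper names; it assumes the `cars` data frame from above and reports whatever the data gives rather than any fixed result.

```r
# Pearson correlation between weight and MPG.
weight_mpg_cor <- function(df) {
  cor(df$weight, df$mpg, use = "complete.obs")
}

# Mean weight and mean MPG for each cylinder count.
means_by_cylinders <- function(df) {
  aggregate(cbind(weight, mpg) ~ cylinders, data = df, FUN = mean)
}
```

A strongly negative value from `weight_mpg_cor(cars)`, together with weight rising and MPG falling across the rows of `means_by_cylinders(cars)`, would be consistent with the statements above.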